Traffic measurements and P2P

Q: How would you describe Internet traffic, based on the paper, and how sure are you?

A1 -- type of traffic: P2P has grown substantially in very few years to become the dominant type of traffic; it used to be Web traffic. We don't really know what has happened since this paper, but presumably second-generation P2P systems, e.g., BitTorrent, have taken off. We don't really know for locations other than UW either.

A2 -- variation in time: There is a clear diurnal cycle (at least at UW), though it is offset between P2P and WWW traffic. Presumably the cycle is also shifted across time zones around the world. But what about finer timescales -- how smooth is Internet traffic? Not as smooth as you might expect, which is the subject of an interesting diversion.

Self-similarity: we would expect from the Central Limit Theorem (CLT) that if we add up many independent sources of traffic (Poisson arrivals are a common "random" model) to get aggregate demand, the result would be quite smooth around an expected mean rate. Instead, real traffic shows bursts over a wide range of timescales (say, milliseconds to hours). This is self-similarity, discovered around 1993. (A small simulation contrasting the two behaviors is sketched at the end of these notes.)

A3 -- characteristics of transfers: P2P clearly has much larger transfers on average, by orders of magnitude. This is a consequence of the application -- applications drive networks! Both P2P and Web traffic have highly skewed distributions of transfer lengths. For the Web, both the popularity and the size of documents are Zipf (or power-law) distributed, so caching is not very effective; it looks more effective for P2P.

Zipf: Many natural phenomena (word frequency, city size) have a distribution where the nth element is weighted as 1/n, or more generally 1/n^k. k ~= 1 is the Zipf distribution; the more general form is a power law. These show up as a straight line on a log-log plot. They are common in networking and have implications for the overall system. In particular, the weight in the (unpopular) tail can dominate the weight of the popular items. This leads to strange situations, e.g., Web caching isn't very effective, because a few large, unpopular items blow the hit rate (see the second sketch below); and pinning a few large flows can control the routes of most of the bytes, even though most flows are short.

A4 -- characteristics on the wire: The skewed transfer sizes lead to a "mice and elephants" world, where most connections are short, but most of the bytes are in a few long flows. (The third sketch below quantifies this.)

Q: Why do any of these characteristics matter?

A: They have implications for the effective design of networks and content distribution systems. Caching was one (negative) result for the Web. Others are congestion control (how will it work if the average flow completes in less than one RTT?), network design (cable, with better downloads than uploads, is less appealing in a P2P world), and content distribution (how to deal with both popular and unpopular objects).
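Sketch 1 -- self-similarity. To make the CLT-versus-burstiness point concrete, here is a minimal simulation (not from the paper; the rates, Pareto shape, and ON/OFF structure are all made-up parameters). It averages a Poisson source and a heavy-tailed ON/OFF source over increasingly coarse time windows: the Poisson traffic smooths out quickly, as the CLT predicts, while the heavy-tailed traffic stays bursty across scales, which is the self-similar signature.

```python
import numpy as np

rng = np.random.default_rng(0)

def cv(series, scale):
    """Coefficient of variation after averaging over windows of `scale` slots."""
    n = len(series) // scale
    agg = series[: n * scale].reshape(n, scale).mean(axis=1)
    return agg.std() / agg.mean()

slots = 2 ** 16

# Poisson arrivals: aggregation smooths quickly, per the CLT.
poisson = rng.poisson(lam=10.0, size=slots).astype(float)

# ON/OFF source with heavy-tailed (Pareto) ON periods: bursts survive
# aggregation over a wide range of timescales. Shape 1.2 and the
# exponential OFF model are illustrative assumptions, not measurements.
heavy = np.zeros(slots)
t = 0
while t < slots:
    on = int(rng.pareto(1.2)) + 1           # heavy-tailed burst length
    off = int(rng.exponential(20.0)) + 1    # idle gap between bursts
    heavy[t : t + on] = 10.0
    t += on + off

for scale in (1, 16, 256):
    print(f"scale {scale:4d}: Poisson CV {cv(poisson, scale):.3f}, "
          f"heavy-tailed CV {cv(heavy, scale):.3f}")
```

Running it, the Poisson CV shrinks roughly as 1/sqrt(scale), while the heavy-tailed CV decays far more slowly -- bursts persist even after heavy aggregation.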
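Sketch 2 -- Zipf and caching. This directly instantiates the 1/n weighting from the Zipf paragraph, for a hypothetical catalogue of 100,000 documents. It computes the best-case hit rate of an idealized cache that pins the most popular items: because the harmonic-series tail keeps contributing weight, even a 100-fold increase in cache size buys only modest gains.

```python
import numpy as np

N = 100_000                       # catalogue size (made up)
ranks = np.arange(1, N + 1)
weight = 1.0 / ranks              # Zipf, k = 1: nth item weighted 1/n
weight /= weight.sum()

# Ideal cache pinning the top items: hit rate is the cached weight.
for frac in (0.001, 0.01, 0.10):
    top = int(N * frac)
    print(f"cache top {frac:.1%} of items -> hit rate {weight[:top].sum():.1%}")
```

For these parameters the hit rates come out near 43%, 62%, and 81%: growing the cache 100x yields less than a doubling of the hit rate, because the unpopular tail carries so much of the total weight.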
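Sketch 3 -- mice and elephants. A similar back-of-the-envelope check for A4: draw flow sizes from a heavy-tailed Pareto distribution (the shape parameter 1.1 is a guess standing in for "highly skewed", not a measured value) and see what fraction of bytes the largest flows carry.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical heavy-tailed flow sizes; elephants end up at the tail.
flows = rng.pareto(1.1, size=100_000) + 1.0
flows.sort()                      # ascending: largest flows last

total = flows.sum()
for frac in (0.01, 0.10):
    top = flows[-int(len(flows) * frac):].sum()
    print(f"largest {frac:.0%} of flows carry {top / total:.1%} of the bytes")
```

Most flows (the mice) sit near the minimum size, yet the few elephants dominate the byte count -- the property that makes pinning a few large flows an effective way to steer most of the traffic.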